Final Project
Group 1:
Nathania Stephens
Hiba Awan
Abstract
Introduction & Background
Motivation/ Purpose
Goals/ Objectives
Data
Overview
About the Data
Three datasets were used to obtain arrests, citations, and warnings for the year 2023 from the Fairfax County Police Department. For simplicity, general definitions are provided:
Arrest - When a person is taken into custody to answer for an offense or when there is a deprivation or restraint of a person’s liberty in any significant way.
Citation - A formal notice issued by a law enforcement officer for a violation of law, typically related to traffic laws or other minor offenses. It typically requires the violator to appear in court or pay a fine.
Warning - When a violation, typically minor, has occurred but the officer issues a warning rather than a citation.
The following attributes were key to the research conducted:
| Column Name | Data Type | Description |
|---|---|---|
| Date | Date | Date of offense |
| Time | Chr | 123 |
| Offense | 1 | 1 |
Limitations and Assumptions
Cleaning and Transformation
Exploratory Analysis
We begin by mapping the arrest data to give a geospatial view of where arrests occur.
Next, we look at the top 10 arrest types by Incident-Based Reporting (IBR) codes.

Next, we examine the top 10 citations.
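A top-10 ranking like the ones described here can be produced by tallying each category and keeping the ten largest counts. The sketch below is a minimal illustration on made-up rows, not the project's actual code: `toy_citations` and its values are hypothetical, and `Charge` is assumed to be the relevant column only because it is the citation column used in the processing code later in this report.

```r
library(dplyr)

# Hypothetical stand-in for the citation data; these rows are invented.
toy_citations <- data.frame(
  Charge = c(rep("Speeding", 5),
             rep("Expired registration", 3),
             rep("Failure to obey highway sign", 2),
             "No seat belt")
)

top_charges <- toy_citations %>%
  count(Charge, sort = TRUE, name = "n") %>%  # tally each charge, largest first
  slice_head(n = 10)                          # keep at most the first ten rows

top_charges
```

The same pattern should apply to the arrest data by swapping in whichever column holds the IBR code.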
Warning Versus Citation
Next, we examine warnings versus citations. This will help us understand which factors could play into getting a warning rather than a citation.
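library(readr)
library(dplyr)     # %>%, rename(), mutate(), select(), bind_rows(), count(), filter()
library(lubridate)
library(ggplot2)   # ggplot(), geom_col(), labs()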
warnings = read_csv("2023_warning_data.csv",
  col_types = cols(Warnings_Date = col_date(format = "%m/%d/%Y"),
                   WEB_ADDRESS = col_skip(), PHONE_NUMBER = col_skip(),
                   NAME = col_skip()))
citations = read_csv("2023_citation_data.csv",
  col_types = cols(Date = col_date(format = "%m/%d/%Y"),
                   WEB_ADDRESS = col_skip(), PHONE_NUMBER = col_skip(),
                   NAME = col_skip()))
# Rename the date column in citations
citations = citations %>%
  rename(ViolationDate = Date)
# Rename Gender to Sex and the date column in warnings so both data sets match
warnings = warnings %>%
  rename(Sex = Gender, ViolationDate = Warnings_Date)
# Adjust Citations and prepare for Merge
# Assumption that ID is the officer's ID
citations_processed = citations %>%
  mutate(
    outcome = "Citation",
    Gender = case_when(
      Sex == "M" ~ "Male",
      Sex == "F" ~ "Female",
      TRUE ~ "Other/Unknown"
    ),
    Year = year(ViolationDate),
    Month = month(ViolationDate),
    DayOfMonth = day(ViolationDate),
    Time = parse_date_time(Time, "HM"),
    data_type = "Citation"
  ) %>%
  select(
    outcome, Gender, Year, Month, DayOfMonth, Time, Offense_Description = Charge,
    District = DISTRICT, Race, Ethnicity, Latitude, Longitude, OfficerID = ID, data_type
  )
# Adjust Warnings and prepare for Merge
warnings_processed = warnings %>%
  mutate(
    outcome = "Warning",
    Gender = case_when(
      Sex == "M" ~ "Male",
      Sex == "F" ~ "Female",
      TRUE ~ "Other/Unknown"
    ),
    Year = year(ViolationDate),
    Month = month(ViolationDate),
    DayOfMonth = day(ViolationDate),
    Time = parse_date_time(Time, "HM"),
    data_type = "Warning"
  ) %>%
  select(
    outcome, Gender, Year, Month, DayOfMonth, Time, Offense_Description, District = DISTRICT, Race,
    Ethnicity, Latitude = Lat, Longitude = Long, OfficerID = Officer_ID, data_type
  )
# Combine citations and warnings into one data set
combined_wc = bind_rows(citations_processed, warnings_processed)
# Add binary outcome: 0 = Citation, 1 = Warning (got out of a ticket)
combined_wc = combined_wc %>%
  mutate(
    BinaryOutcome = ifelse(outcome == "Warning", 1, 0)
  )
## Change to Title Case for District Names
combined_wc$District = tools::toTitleCase(tolower(combined_wc$District))
## Examining Unverified data
## Unverified makes up only 0.0143 (1.43%) of the data set, so we remove it
## because it is a very small share of the total.
combined_wc %>%
  count(District) %>%
  mutate(Proportion = n / sum(n)) %>%
  arrange(desc(n))
# A tibble: 11 × 3
   District         n Proportion
   <chr>        <int>      <dbl>
 1 Sully        18612  0.208
 2 Springfield  12581  0.140
 3 Braddock     10292  0.115
 4 Franconia    10033  0.112
 5 Hunter Mill   8718  0.0972
 6 Mason         8168  0.0911
 7 Dranesville   7143  0.0797
 8 Providence    6713  0.0749
 9 Mount Vernon  6113  0.0682
10 Unverified    1281  0.0143
11 <NA>             1  0.0000112
## Filter out Unverified and NA districts
## (District != "Unverified" evaluates to NA for missing values, which filter()
## also drops; the explicit is.na() check makes that intent visible)
combined_wc = combined_wc %>%
  filter(!is.na(District), District != "Unverified")
## Filter out Other/Unknown Gender
combined_wc_mf = combined_wc %>%
  filter(Gender != "Other/Unknown")
## Now for some visuals: Gender Chart
## Examining the proportion of stops resulting in a Warning Vs Citation
## the Warning rate is the proportion of incidents that are warnings.
gender_warning_rate = combined_wc_mf %>%
  group_by(Gender) %>%
  summarise(
    Total_Incidents = n(),
    Warning_Rate = mean(BinaryOutcome)
  ) %>%
  ungroup()
gender_chart = ggplot(gender_warning_rate,
                      aes(x = Gender, y = Warning_Rate, fill = Gender)) +
  geom_col(show.legend = FALSE) +
  geom_text(aes(label = scales::percent(Warning_Rate, accuracy = 0.1)),
            vjust = -0.5, size = 5) +
  scale_y_continuous(labels = scales::percent,
                     limits = c(0, max(gender_warning_rate$Warning_Rate) * 1.1)) +
  labs(
    title = "Warning Rate by Gender",
    subtitle = "Proportion of stops resulting in a Warning (vs Citation)",
    x = "Gender",
    y = "Warning Rate"
  ) +
  theme_gray() +
  theme(plot.title = element_text(hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5)) +
  scale_fill_manual(values = c("Female" = "pink", "Male" = "skyblue"))
gender_chart
## Now the Chi-Squared Test starting with the Contingency Table
contingency_tbl = combined_wc_mf %>%
  filter(Gender %in% c("Male", "Female")) %>%
  select(Gender, BinaryOutcome) %>%
  table()
contingency_tbl
        BinaryOutcome
Gender       0     1
  Female 20478  8777
  Male   43657 15408
chi_sq_results = chisq.test(contingency_tbl)
chi_sq_results
Pearson's Chi-squared test with Yates' continuity correction
data: contingency_tbl
X-squared = 150.62, df = 1, p-value < 2.2e-16
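With roughly 88,000 stops in the table, even a small difference between groups yields a very small p-value, so it may help to look at the size of the difference as well. The sketch below re-enters the counts from the contingency table above, reruns the same test, and adds per-gender warning rates and an odds ratio. The odds ratio is an illustrative addition and was not part of the original analysis; the object names (`counts`, `warning_rate`, `odds_ratio`) are hypothetical.

```r
# Counts copied from the contingency table printed above
# (columns: 0 = Citation, 1 = Warning).
counts <- matrix(
  c(20478,  8777,
    43657, 15408),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Female", "Male"),
                  BinaryOutcome = c("0", "1"))
)

# Row-wise proportions: share of each gender's stops that ended in a warning.
warning_rate <- prop.table(counts, margin = 1)[, "1"]

# Same test as above; Yates' continuity correction is the default for 2 x 2 tables.
test <- chisq.test(counts)

# Odds of a warning within each gender, then females relative to males.
odds <- counts[, "1"] / counts[, "0"]
odds_ratio <- odds[["Female"]] / odds[["Male"]]

round(warning_rate, 3)
round(odds_ratio, 2)
```

From these counts, about 30.0% of stops involving females and about 26.1% of stops involving males ended in a warning, which corresponds to an odds ratio of roughly 1.21.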
Research Questions
Is there an association between gender and warnings?
Are there other factors that determine whether someone gets out of a “ticket”? Alternatively: are you more likely to get a ticket at the end of the month? (Some believe that police officers have a monthly quota.)
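One possible way to approach the end-of-month question is a logistic regression of the binary outcome on an end-of-month indicator, controlling for gender. The sketch below is only an outline of that idea under stated assumptions: it runs on simulated rows rather than the real data, the day-25 cutoff for "end of month" is an arbitrary hypothetical choice, and `toy`, `EndOfMonth`, and `fit` are illustrative names. Only `DayOfMonth`, `Gender`, and `BinaryOutcome` correspond to columns created earlier in this report.

```r
set.seed(1)

# Simulated stand-in for combined_wc_mf; none of these rows are real data.
n <- 2000
toy <- data.frame(
  DayOfMonth = sample(1:31, n, replace = TRUE),
  Gender     = sample(c("Female", "Male"), n, replace = TRUE)
)
toy$BinaryOutcome <- rbinom(n, size = 1, prob = 0.27)  # 1 = Warning, 0 = Citation

# Hypothetical cutoff: treat day 25 onward as "end of month".
toy$EndOfMonth <- toy$DayOfMonth >= 25

# Logistic regression: log-odds of a warning by end-of-month flag and gender.
fit <- glm(BinaryOutcome ~ EndOfMonth + Gender, data = toy, family = binomial)

# Exponentiated coefficients are odds ratios. A value below 1 for
# EndOfMonthTRUE would indicate warnings are less likely late in the month.
exp(coef(fit))
```

Because the outcome here is simulated independently of the day, no real conclusion can be drawn from this output; the point is only the model structure, which could be refit on the actual combined data.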